
fix(api_server): require CallbackToken for Merkle callback endpoint (#76)#112

Merged
galt-tr merged 3 commits into main from fix/issue-76-callback-endpoint-auth on May 1, 2026
Conversation

@galt-tr galt-tr commented May 1, 2026

Summary

  • handleCallback always validates the bearer; an empty CallbackToken no longer disables the check.
  • Config validation rejects empty CallbackToken when MerkleService.URL is configured.
  • Token compare uses subtle.ConstantTimeCompare to remove the timing side channel.
  • Tests cover the missing, wrong, and empty-config bearer paths plus the new config validation rule. The standalone-mode path (MerkleService.URL == "" and CallbackToken == "") from #59 ([F-001] documented standalone configs cannot start without a Merkle service) keeps validating cleanly.

Closes #76

Operational note

Existing deployments where MerkleService.URL is set but CallbackToken is empty will fail to start after this change. They were already silently accepting forged callbacks; the fail-fast validation surfaces the misconfiguration.

Test plan

  • go build ./...
  • go vet ./...
  • go test ./services/api_server/... ./config/... -race
  • golangci-lint run ./services/api_server/... ./config/... (0 issues)
  • Reviewer to confirm the deploy-time validation rule is the right shape


handleCallback no longer skips bearer-token validation when
CallbackToken is empty; it now always demands a valid bearer.
Config validation rejects an empty CallbackToken when MerkleService
is configured, so operators can't accidentally deploy an unauthenticated
callback receiver. Token comparison uses constant-time compare to
remove the timing side channel. Closes F-018.
@galt-tr galt-tr requested a review from mrz1836 as a code owner May 1, 2026 18:41
github-actions bot added the size/L (Large change, 201–500 lines) and bug-P3 (Lowest rated bug, affects nearly none or low-impact) labels May 1, 2026
@galt-tr galt-tr merged commit a00bbd6 into main May 1, 2026
45 checks passed
@galt-tr galt-tr deleted the fix/issue-76-callback-endpoint-auth branch May 1, 2026 19:21
galt-tr added a commit that referenced this pull request May 2, 2026
PR #121 (issue #91) added TestHandleCallback_UnknownTxid_NoPhantomRow
without the bearer header that PR #112 (issue #76) made mandatory on
the callback endpoint, so the test was failing on every PR rebased
onto main. Use the existing authedCallbackRequest helper.
mrz1836 pushed a commit that referenced this pull request May 2, 2026
* fix(api_server): cap callback request body size (#77)

handleCallback binds JSON without limiting body size, allowing a
malicious or malfunctioning peer to exhaust memory with oversized
payloads (especially STUMP blobs). Bodies are now wrapped in
http.MaxBytesReader; oversize requests return 413. The limit is
configurable via Callback.MaxBodyBytes (default 16 MiB). Closes F-019.

* test(api_server): authenticate the unknown-txid callback test

PR #121 (issue #91) added TestHandleCallback_UnknownTxid_NoPhantomRow
without the bearer header that PR #112 (issue #76) made mandatory on
the callback endpoint, so the test was failing on every PR rebased
onto main. Use the existing authedCallbackRequest helper.
mrz1836 pushed a commit that referenced this pull request May 2, 2026
…#87) (#123)

* fix(bump_builder): preserve block height when publishing mined status (#87)

The mined-status update path was dropping blockHeight between
InsertBUMP and SetMinedByTxIDs (or the subsequent publish), so
downstream consumers received MINED statuses with a zero or unset
height. Block height is now threaded through the whole call chain
and asserted in tests. Closes F-029.

* test(api_server): authenticate the unknown-txid callback test

PR #121 (issue #91) added TestHandleCallback_UnknownTxid_NoPhantomRow
without the bearer header that PR #112 (issue #76) made mandatory on
the callback endpoint, so the test was failing on every PR rebased
onto main. Use the existing authedCallbackRequest helper.
galt-tr added a commit that referenced this pull request May 3, 2026
…tration (#131)

Arcade's /api/v1/merkle-service/callback requires bearer-token auth via
cfg.CallbackToken (PR #112 / F-018), but the /watch registration to
merkle-service didn't tell merkle-service what token to send back.
Result: merkle-service calls arcade with no Authorization header and
gets 401.

Register/RegisterBatch now accept a callback-token argument and emit
it in the watchRequest JSON. The propagator passes cfg.CallbackToken
through. Empty tokens omit the field so older deployments without a
configured token continue to work as today (with the same 401 they
already get from the inbound check, which is the correct fail-closed
behaviour).

Pairs with merkle-service PR <leave-placeholder> which makes the
receiving end actually store and forward the token on outbound
delivery. Both PRs must merge before authenticated callbacks work.
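
The "empty tokens omit the field" behaviour described above maps naturally onto Go's `omitempty` JSON tag. The sketch below assumes a simplified watchRequest; the field names and JSON keys are illustrative guesses, not the actual wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// watchRequest is a simplified, hypothetical version of the registration
// payload. With omitempty, an empty CallbackToken is left out of the JSON
// entirely, so deployments without a token send the same body as before.
type watchRequest struct {
	TxID          string `json:"txid"`
	CallbackURL   string `json:"callbackUrl"`
	CallbackToken string `json:"callbackToken,omitempty"`
}

// encodeWatch marshals a registration and returns it as a string.
func encodeWatch(txid, callbackURL, callbackToken string) string {
	b, err := json.Marshal(watchRequest{
		TxID:          txid,
		CallbackURL:   callbackURL,
		CallbackToken: callbackToken,
	})
	if err != nil {
		// Marshaling a struct of plain strings is not expected to fail.
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encodeWatch("abc", "http://arcade.example/cb", "s3cret"))
	fmt.Println(encodeWatch("abc", "http://arcade.example/cb", ""))
}
```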
galt-tr added a commit that referenced this pull request May 12, 2026
merkle-service #112 shipped a POST /reprocess endpoint that re-drives a
block through STUMP + BLOCK_PROCESSED delivery for a single requesting
arcade — the operator-facing recovery hatch when arcade missed the
original BLOCK_PROCESSED event. The new smoke scenario proves the
recovery path lands a valid compound BUMP at arcade:

  1. arcade registers 10 watched txids via /watch
  2. NO libp2p BlockMessage is ever published, so merkle-service
     never processes the block live (simulates "arcade missed the
     BLOCK_PROCESSED event")
  3. test sanity-checks none of the txs reached MINED
  4. test calls POST /reprocess with the block hash + arcade's
     callback URL + token
  5. merkle-service probes its DATAHUB_FALLBACK_URLS, finds the
     harness datahub, runs the full block-processing pipeline
     with override callback URL/token + BypassDedup=true so
     subtree-worker emits STUMPs filtered to arcade and a single
     BLOCK_PROCESSED scoped to arcade's callback
  6. arcade's bump-builder builds the compound BUMP, marks all 10
     txs MINED, and each merklePath ComputeRoot()s back to the
     block's real header merkle root

Harness additions:

- harness.New() learns a WithReprocessReady() option. When set, it
  pre-picks a free TCP port, builds the datahub URL on the network
  gateway IP, and threads it into the merkle-service container's
  DATAHUB_FALLBACK_URLS env var BEFORE the container starts. The
  Datahub listener binds on that pre-reserved port after the
  container is up. h.Datahub is the resulting pre-registered datahub.

- harness.NewDatahubOnPort(t, opts, port) — datahub constructor
  that accepts an explicit port. port=0 keeps the auto-pick
  behavior NewDatahubWith/NewDatahub already had.

- harness.TriggerReprocess(ctx, merkleHostURL, blockHash, callbackURL,
  callbackToken) — POSTs to merkle-service /reprocess, parses the
  202 body into ReprocessResponse{Status, BlockHash, DataHubURL}.
  Returns the parsed response so callers can assert which datahub
  URL got picked up.

No new fixtures — the test reuses the existing single-subtree
mainnet block fixture from the round-trip test.

Like the other end-to-end tests, this one requires container ↔ host
networking that works on Docker but not on rootless podman with
pasta's --no-map-gw default. The /reprocess call fails locally with
"no DataHub could serve the requested block" — documented in
tests/e2e/README.md. CI on GitHub Actions uses Docker so the test
runs the full path there.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

[F-018] Merkle callback endpoint is unauthenticated by default